Abstract

Background: NanoString's Prosigna™ Breast Cancer Prognostic Gene Signature Assay is based on the PAM50 gene 
expression signature. The test outputs a risk of recurrence (ROR) score, risk category, and intrinsic subtype (Luminal 
A/B, HER2-enriched, Basal-like). The studies described here were designed to validate the analytical performance of 
the test on the nCounter Analysis System across multiple laboratories. 

Methods: Analytical precision was measured by testing five breast tumor RNA samples across 3 sites. Reproducibility 
was measured by testing replicate tissue sections from 43 FFPE breast tumor blocks across 3 sites following independent 
pathology review at each site. The RNA input range was validated by comparing assay results at the extremes of the 
specified range to the nominal RNA input level. Interference was evaluated by including non-tumor tissue in the test.

Results: The measured standard deviation (SD) was less than 1 ROR unit within the analytical precision study and the 
measured total SD was 2.9 ROR units within the reproducibility study. The ROR scores for RNA inputs at the extremes 
of the range were the same as those at the nominal input level. Assay results were stable in the presence of moderate 
amounts of surrounding non-tumor tissue (<70% by area). 

Conclusions: The analytical performance of NanoString's Prosigna assay has been validated using FFPE breast tumor 
specimens across multiple clinical testing laboratories. 

Keywords: PAM50, Analytical validation, ROR, Subtype, Breast cancer, Prosigna, NanoString, nCounter, Reproducibility, 
FFPE, Gene expression 







Background 

Molecular biomarkers have played an increasingly im- 
portant role in identifying cancer patients with different 
prognostic outcomes and in predicting response to 
chemotherapy [1-3]. Molecular assays targeting these 
biomarkers are now routinely performed in local path- 
ology labs to help guide treatment decisions in breast 
cancer [4,5], lung cancer [6], and colorectal cancer [7]. 
Gene expression analysis has helped identify distinct 
molecular signatures in breast cancer that have different
prognostic outcomes [8-10]. Multigene assays targeting 
21-70 genes are now routinely used in clinical practice 
to assess risk of recurrence in early stage breast cancer 
[11,12], and prospective clinical trials are also underway 
to provide further supporting evidence for the clinical 
utility of these assays [13,14]. To date, breast cancer 
multigene clinical assays have been largely limited to 
central reference laboratories due to the complexity of 
performing the test. Ultimately, development of assays 
with a simplified workflow is required to move these 
multigene expression tests into the local pathology lab 
setting, where efficiencies such as shorter turnaround 
time and direct interaction between laboratory physicians
and the clinicians will benefit active patient care.

The PAM50 gene signature measures the expression 
levels of 50 genes in a surgically resected breast cancer 
sample to classify a tumor as one of four intrinsic subtypes 
(Luminal A, Luminal B, HER2-enriched, and Basal-like) 
[15], which have been shown to be prognostic in both un- 
treated (i.e. no adjuvant systemic therapy) and tamoxifen 
treated patient populations [15,16]. In addition to identify- 
ing a tumor's intrinsic subtype, the PAM50 signature gen- 
erates an individualized score estimating a patient's 
probability of disease recurrence by weighting the molecu- 
lar subtype correlations, a subset of proliferation genes, 
and pathologic tumor size [15,16]. The PAM50 test was 
adapted to be performed using the nCounter Analysis Sys- 
tem in order to develop a simplified workflow that could 
be performed in a local pathology lab (Prosigna™ Breast
Cancer Gene Signature Assay, NanoString Technologies, 
Seattle). This technology uses multiplexed gene-specific 
fluorescently labeled probe pairs [17] to measure gene expression
in frozen or formalin-fixed paraffin-embedded
(FFPE) tissues with equivalent ease and efficiency [18]. A 
recent clinical validation performed on RNA extracted 
from over 1000 FFPE tumor specimens from the ATAC 
clinical trial demonstrated that the Prosigna risk of recur- 
rence (ROR) score, based on the PAM50 gene expression 
signature, added significant prognostic information beyond
the Oncotype DX® Recurrence Score® in estimating
the likelihood of distant recurrence in hormone receptor 
positive, post-menopausal breast cancer patients [19] 
treated with endocrine therapy alone. A second clinical 
validation study performed on over 1400 FFPE patient 
samples from the ABCSG-8 trial has independently 
confirmed the clinical validity and demonstrated add- 
itional prognostic value in node-positive patients and 
for the risk of late recurrence [20,21]. Based in part 
on the results from these clinical studies and the ana- 
lytical studies described herein, NanoString obtained a 
CE Mark for its Prosigna assay in 2012, followed by 
US Food and Drug Administration (FDA) clearance in 
September of 2013. 

Recently, requirements for demonstrating utility of a 
tumor biomarker were established that include not only 
clinical validity, but also analytical reproducibility and 
robustness [22,23]. The results of ATAC and ABCSG-8,
including a follow-up combined analysis of the two studies
[24], meet this high level of evidence (Level I) for
clinical validity using archived specimens [22]. The studies
described herein were designed to test the analytical
validity of decentralized use of the Prosigna assay across 
multiple clinical testing sites, following established 
guidelines [25]. These studies were also designed to val- 
idate procedures for training laboratory personnel to 
perform the Prosigna assay on the nCounter system. 



Methods 

NanoString Prosigna assay 

The tissue input for the Prosigna assay was FFPE tissue
that had been previously diagnosed to contain viable
invasive breast carcinoma. The breast tumor tissue must
be classified by a pathologist as invasive carcinoma
(ductal, lobular, mixed, or no special type). A pathologist
reviews an H&E-stained, slide-mounted tumor section
to identify and circle the region of viable invasive breast
carcinoma. The tumor surface area on the H&E-stained
section must be > 4 mm² per slide, with tumor cellularity
> 10%. Non-tumor tissue from outside the circled
area is removed by macrodissection of the corresponding
unstained slides. RNA was extracted from slide-mounted
breast tissue sections using an RNA extraction
kit manufactured by Roche to NanoString's specifications
[26]. A single 10-micron slide-mounted
tissue section was used for RNA extraction
when the tumor surface area measured > 100 mm²,
whereas 3 slides were used when the tumor surface area
measured 4-99 mm². Following extraction of total RNA
and removal of genomic DNA, RNA was eluted (30 µL
volume) and tested to ensure it met the specifications
for concentration (> 12.5 ng/µL) and purity (OD 260/280 nm
ratio of 1.7-2.5).
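
The slide-count rule and RNA acceptance specifications described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the Prosigna software: the function names are hypothetical, and the handling of boundary values (an area of exactly 4 mm² or exactly 100 mm²) is an assumption, since the text does not specify it.

```python
def slides_required(tumor_area_mm2: float) -> int:
    """Return the number of 10-micron slides to extract (0 = not eligible).

    Assumption: areas below 4 mm² are ineligible, areas above 100 mm² need
    one slide, and everything in between needs three slides.
    """
    if tumor_area_mm2 < 4:
        return 0
    if tumor_area_mm2 > 100:
        return 1
    return 3


def rna_meets_spec(conc_ng_per_ul: float, od_260_280: float) -> bool:
    """Check eluted RNA against the stated concentration and purity limits."""
    return conc_ng_per_ul > 12.5 and 1.7 <= od_260_280 <= 2.5


# Hypothetical example values, not study data.
print(slides_required(150.0), slides_required(50.0), slides_required(2.0))
print(rna_meets_spec(20.0, 1.9), rna_meets_spec(10.0, 1.9))
```

In the assay itself these checks are part of the laboratory procedure; the sketch only makes the decision logic explicit.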

The NanoString Prosigna assay [26] measures the ex- 
pression levels of 50 target genes plus eight constitu- 
tively expressed normalization genes [15,27,28]. Assay 
controls are included to ensure that test samples and the 
test process meet pre-defined quality thresholds. Ex- 
ogenous probes with no sequence homology to human 
RNA sequences are included as positive and negative 
assay controls. Positive controls consist of a six-point
linear titration of in vitro transcribed RNA covering
an approximately 1000-fold RNA concentration
range (0.125 - 128 fM) and corresponding probes
[29,30]. Negative controls consist of a set of probes without
the corresponding targets. Each assay run includes
two reference control samples comprising in vitro tran- 
scribed RNA of the 58 targets for qualification and 
normalization purposes. 

Extracted RNA samples meeting quality and concen- 
tration specifications were hybridized (without reverse 
transcription or amplification) to capture and reporter 
probes for the measured genes and assay controls. The 
multiplexed hybridizations are carried out in a single
tube for 15-21 h at 65°C using 125-500 ng RNA
(nominal input of 250 ng). After hybridization, the
target-probe complexes were processed on the nCounter 
Analysis System. Test sample data must meet a mini- 
mum threshold for expression of normalizing genes to 
ensure that the assay signal is high enough for the algorithm
to produce precise results. The linearity of the
positive control target titration and the non-specific
background from negative control probes included in
each assay are used to determine whether each assay performed
within specification. Since the test is designed to
be run in local molecular pathology labs, all quality 
thresholds are applied automatically to the data by em- 
bedded software; any failing metric causes an assay fail- 
ure notice which prevents output of a Prosigna assay 
result. For samples meeting all quality thresholds, a clin- 
ically validated algorithm is used to determine the intrin- 
sic subtype and ROR score, which are prognostic 
indicators of risk of distant recurrence of breast cancer 
[19,21]. The normalized gene expression profile of each 
breast tumor sample is correlated to prototypical gene 
expression profiles of the four breast cancer intrinsic 
subtypes (Luminal A, Luminal B, HER2-enriched, and 
Basal-like). The primary tumor size (categorical input
of < 2 cm or > 2 cm) and the normalized gene expression
profile of each breast tumor sample are used to calculate
the numerical ROR score. Risk categories are assigned to
allow interpretation of the ROR score by using pre- 
specified cutoffs (defined in a clinical validation study) 
related to risk of distant recurrence after 10 years [19]. 

Operators for these studies were required to undergo 
training procedures to demonstrate proficiency, equiva- 
lent to what will be used to train users in molecular 
pathology laboratories for the decentralized test. Each 
site was given an overview of the NanoString technology 
and Prosigna assay procedures followed by an in-lab ex- 
ercise where users were trained and qualified on tissue 
processing and assay procedures (requiring 10-12 hours 
of total hands-on time). Briefly, each user extracted RNA 
from three FFPE breast tumor tissue samples to demon- 
strate proficiency in tissue processing, and each user proc- 
essed four prototypical breast tumor RNA samples (one of 
each intrinsic subtype with known expected ROR score 
values) along with a negative control sample to demon- 
strate proficiency on the nCounter Analysis System. 

The analytical studies described herein were performed
using pre-specified SOPs, statistical analysis plans, and acceptance
criteria, with clinical-grade reagents, instrumentation,
and software formatted so that results could not be
compared between test centers until
the study was completed.

RNA precision: study design 

The RNA Precision study assessed the reproducibility of 
the Prosigna assay using a common template of purified 
RNA, thereby isolating the device-specific components 
of analytical validity from variables associated with tissue 
processing. The experimental design for analytically val- 
idating the precision of the assay from RNA was based 
on Clinical and Laboratory Standards Institute (CLSI)
guidelines for the evaluation of precision of in vitro diagnostic
devices outlined in EP05-A2 [25]. This design
measured the variability between and within a number
of assay variables including testing site (n = 3), operator 
(n = 6), reagent lot (n = 3) and assay run (n = 18/site). Two 
of the three sites used were CLIA-certified, CAP-accredited 
laboratories at the British Columbia Cancer Agency 
(Vancouver), and Washington University (St. Louis); the 
third site was NanoString Technologies (Seattle). 

Five pooled breast tumor RNA samples were gener- 
ated from archived FFPE breast tumor tissue samples 
containing viable invasive breast carcinoma, to comprise 
a sample set representing each intrinsic breast cancer 
subtype and risk classification group (Table 1). Since the 
samples were pooled breast tumor RNA, a default tumor 
size category of < 2 cm was used to determine the esti- 
mated ROR score, and a default nodal status of node- 
negative was used to determine risk category. This design 
ensured that the prototypical gene expression profiles en- 
countered during routine testing were represented within 
this analytical validation study. Since Luminal subtypes 
make up the vast majority of the intended use population 
(hormone receptor positive patients), the study design 
included three Luminal samples to span the risk classifica- 
tion groups. The identity of each sample aliquot was de- 
identified using labeled sample tubes with unique, 
randomly assigned, barcoded IDs to ensure that the opera- 
tors were blinded to any possible expected results of each 
test sample. 

Single use aliquots of each pooled breast tumor RNA 
sample and three reagent lots were distributed to each 
of the three testing sites to complete the following test- 
ing scheme (Figure 1). Each of the five RNA pooled sam- 
ples was tested in duplicate during each run at the 
nominal RNA input level for the assay of 250 ng. The 
positions of the tumor RNA samples within the system 
(cartridge and strip tube position) were pre-assigned in a 
randomized and balanced manner for each run. Each 
operator completed one run on a given day since the 
assay includes an overnight hybridization step qualifying 
it as a "long run method" per CLSI EP05-A2. Following 
a device and study protocol familiarization run, each site 
completed 18 valid runs (9 by each operator) (Figure 1). 
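
The size of the resulting dataset follows directly from this testing scheme. The Python sketch below enumerates the design (three sites, two operators per site, nine runs per operator, five samples in duplicate) and counts the expected measurements. It is illustrative only: the variable names are hypothetical, and the rotation of reagent lots across runs is left out because the text does not describe it.

```python
from itertools import product

SITES = ["Site 1", "Site 2", "Site 3"]
OPERATORS_PER_SITE = 2   # six operators in total
RUNS_PER_OPERATOR = 9    # 18 valid runs per site
SAMPLES = ["Luminal A", "Luminal B #1", "Luminal B #2",
           "Basal-like", "HER2-enriched"]
REPLICATES_PER_RUN = 2   # each sample tested in duplicate per run


def enumerate_design():
    """List every planned measurement as a dict of design factors."""
    return [
        {"site": site, "operator": op, "run": run,
         "sample": sample, "replicate": rep}
        for site, op, run, sample, rep in product(
            SITES,
            range(1, OPERATORS_PER_SITE + 1),
            range(1, RUNS_PER_OPERATOR + 1),
            SAMPLES,
            range(1, REPLICATES_PER_RUN + 1),
        )
    ]


design = enumerate_design()
n_results = len(design)
n_runs = len({(r["site"], r["operator"], r["run"]) for r in design})
n_per_sample = sum(r["sample"] == "Luminal A" for r in design)
n_per_sample_per_site = sum(
    r["sample"] == "Luminal A" and r["site"] == "Site 1" for r in design
)
print(n_results, n_runs, n_per_sample_per_site, n_per_sample)  # 540 54 36 108
```

These counts agree with the 540 results from 54 valid runs, the 36 tests per sample per site, and the 108 measurements per sample pool reported in the Results.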



Table 1 RNA precision study sample summary

Intrinsic subtype    Estimated ROR score    Risk classification
Luminal A            30                     Low
Luminal B #1         5--1                   Intermediate
Luminal B #2         64                     High
Basal-like           55                     Intermediate
HER2-enriched        76                     High

Molecular characteristics of the five pooled breast tumor RNA samples used in
the RNA precision study.

Figure 1 Overview of the design for the RNA precision validation study. Five pooled breast tumor RNA samples were tested across several
sites, operators, reagent lots, and runs. Design elements shown: Sites 1-3; Operators 1 and 2 at each site; Reagent Lots 1-3; samples
Basal-like, HER2-enriched, Luminal A, Luminal B 1, and Luminal B 2, each tested in duplicate (x2).



Upon completion of the study the blinded data were 
collected from each site and merged with the expected 
test result and study variables (site, operator, reagent lot, 
etc.) associated with each unique sample ID. The pro- 
spectively defined analysis plan was then executed on 
the merged analysis dataset. 

RNA precision: statistical analysis 

The pre-specified primary aim of the RNA precision validation
was to demonstrate that there were no significant
differences in the continuous ROR score assay output
across the three testing sites.

The following variance components model was used 
to characterize the sources of variability: 

ROR Score = site + operator + lot + run + within-run 

where all components were treated as random compo- 
nents, and the RNA assay component of variation was 
defined as the sum of all these components. Variance 
components were estimated using the R function
"lmer". To test whether sites were significantly different,
the following two versions of the above model were fitted:

ROR Score = site + operator + lot + run + within-run
and
ROR Score = operator + lot + run + within-run

where site was now treated as fixed and all other components
were treated as random. A likelihood ratio test
with 2 degrees of freedom was performed using the fitted
models to determine whether the effect of site was
significant (α = 0.05). A similar analysis was performed
for the assay reagent lots.
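
The likelihood ratio test above compares the model with a fixed site effect (three sites, hence two extra parameters) against the model without it. The following Python sketch shows only the final step: turning two fitted log-likelihoods into a test statistic and p-value. It is not the authors' R code; the function name and the log-likelihood values are hypothetical. For 2 degrees of freedom the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def likelihood_ratio_test_2df(loglik_full: float, loglik_reduced: float,
                              alpha: float = 0.05):
    """Likelihood ratio test for nested models differing by 2 parameters.

    Returns (statistic, p_value, significant).
    """
    # Twice the log-likelihood gain of the fuller model; floored at zero to
    # guard against tiny negative values from numerical optimisation.
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Chi-square survival function with 2 degrees of freedom.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value, p_value < alpha


# Hypothetical log-likelihoods, not values from the study.
stat, p, significant = likelihood_ratio_test_2df(-100.0, -103.0)
print(round(stat, 3), round(p, 4), significant)  # 6.0 0.0498 True
```

A statistic of about 5.99 corresponds to p = 0.05, so only a log-likelihood gain of roughly 3 units or more from including site would be declared significant at this level.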

For each of the 5 pooled samples, the classifications 
into the 4 intrinsic subtype categories (Luminal A, Lu- 
minal B, Basal-Like, HER2-enriched) were summarized 
using frequency tables. 

Reproducibility: study design 

The reproducibility study assessed the analytical validity
of the Prosigna assay, including all steps involved
in clinical lab implementation (i.e. tissue handling and
RNA isolation SOPs as well as the device-specific
assay steps), using a common set of breast cancer tissue
samples.

The experimental design for analytically validating the 
reproducibility from tissue was based on CLSI guidelines 
for the evaluation of precision of in vitro diagnostic de- 
vices outlined in CLSI EP05-A2. This design allows for 
the measurement of variability between and within a 
number of assay variables including testing site, FFPE 
sample block, operator, reagent lot, and assay run. 

A set of 43 banked FFPE breast tumor blocks from 
hormone receptor positive breast cancer patients with 
confirmed invasive breast carcinoma was selected from 
the biobank at Washington University at St. Louis for 
this reproducibility validation study. Sample collection
and the conduct of this study complied
with the study protocols and local IRB procedures. One
FFPE block for each case was selected using the following
criteria:

1. Every case should represent a unique breast cancer 
patient 

2. All must be primary breast cancers 

3. All are pathology-confirmed invasive ductal or
lobular carcinoma, a mixture of these types, or
classified as no special type

4. All are hormone receptor positive (ER+ or PgR+)
breast cancer

5. All must have a recorded tumor size

6. FFPE blocks should be < 10 years old

7. A minimum of 10 cases each of > 100 mm² tumor
area (1 slide/test) and 4-100 mm² tumor area
(3 slides/test)

The criterion that at least 10 cases contain > 100 mm²
and at least 10 cases contain 4-99 mm² tumor area
was implemented to validate the number of slides required
for the assay. The blocks were not prescreened
with the assay prior to inclusion, but it was anticipated 
that the 43 samples would cover a broad range of ROR 
scores representative of the intended use population, 
including both node-negative and node-positive pa- 
tients, and each risk classification group. Seventeen tis- 
sue samples were from node-negative patients, 6 from 
node-positive patients and 20 were from patients whose 
regional lymph node status was provided by the bio- 
bank as NX. 



For reproducibility testing (Figure 2), three sets of serially
cut sections, each comprising one 4-micron H&E-stained
slide and three 10-micron unstained slides,
were prepared from each FFPE block. All cut and slide
mounted sections were shipped to NanoString and then 
one set from each of the 43 blocks was distributed to the 
appropriate testing site for processing. All 43 specimens 
were reviewed independently by a separate pathologist for 
each of the three sites. 

For each tissue sample, a test run consisting of macro- 
dissection, RNA extraction, and testing with the Pro- 
signa assay was performed by a single operator at each 
site following the provided standard operating proce- 
dures. Each operator performed a minimum of four test 
runs consisting of up to 10 tissue samples per run. Each 
batch of tissue samples required a minimum run time of 
3 days from tissue processing to result. Isolated RNA 
that met the quantity and quality specifications from 
each of the slide mounted sections was tested twice in 
separate assay runs. Different lots of RNA isolation kit 
reagents were used at each site, and a single lot of the 
Prosigna assay kit was used at all three sites. 

The test results for all samples remained blinded to all 
personnel at all sites until the study was complete. Upon 
completion of the study the blinded Prosigna assay data 
were collected from each site and merged with the ex- 
pected test result and study variables (site, operator, re- 
agent lot, etc.) associated with each unique sample ID. 
The prospectively defined analysis plan was then exe- 
cuted on the merged analysis dataset. 



Figure 2 Overview of the design for the tissue reproducibility validation study. Tissue samples (1-43) were processed in parallel across
different sites, pathologists, operators, and RNA isolation kits. Design elements shown: each of Sites 1-3 had its own pathologist,
operator, and RNA isolation kit lot (lots 1-3); RNA was run in duplicate with Prosigna reagent lot 1 at every site.



Nielsen et al. BMC Cancer 2014, 14:177
http://www.biomedcentral.com/1471-2407/14/177

Reproducibility: statistical analysis 

The pre-specified primary aim of the tissue reproducibil- 
ity validation was to demonstrate the Prosigna assay is 
highly reproducible, when combining all sources of vari- 
ation. For this study, "highly reproducible" was defined 
as a total standard deviation (SD) of less than 4.3 ROR 
units. The value of <4.3 was chosen because if two sam- 
ples have true ROR scores that differ by 10 units, a total 
SD of 4.3 means that 95% of the time the higher of the 
two will still have a higher individual observed ROR 
score. A change of 10 ROR units corresponds to an aver- 
age change in 10-year distant recurrence free survival of 
7% and 6% for node negative and node positive patients 
respectively [19]. 
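
The rationale for the 4.3-unit threshold can be checked numerically. The Python sketch below assumes that the two observed scores carry independent, normally distributed errors with the same total SD; under that assumption (ours, made for illustration) the difference between the two observations has an SD of 4.3 × √2, and the probability that the ordering is preserved is the standard normal CDF evaluated at 10 / (4.3 × √2). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def prob_correct_ordering(true_difference: float, total_sd: float) -> float:
    """Probability that the sample with the higher true ROR score also has
    the higher observed score, given independent normal errors."""
    sd_of_difference = total_sd * math.sqrt(2.0)
    return NormalDist().cdf(true_difference / sd_of_difference)


p_threshold = prob_correct_ordering(10.0, 4.3)
print(round(p_threshold, 3))  # 0.95
```

Any total SD below 4.3 ROR units therefore preserves the ordering of two samples that truly differ by 10 units more than 95% of the time.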

The following variance components model was used 
to characterize the sources of variability: 

Measurement = FFPE Block + site + tissue section + error

where FFPE Block was treated as a fixed component,
and site and section were treated as random components.
The "site" term measured the systematic site-specific
variation that was constant across all tissue
samples (pathologist, technician, extraction kit). The tissue
section component measured random variation that
differed as a function of review/processing or within-FFPE-block
variation. The error term was derived from
the duplicate RNA samples and estimated the combination
of run-to-run and within-run variance. Variance
components were estimated using the R function
"lmer". In the above model, the variance components
were estimated from a combined analysis of all FFPE
blocks after verifying that there were no systematic changes in
tissue-specific variation as a function of ROR score.

The tissue and RNA isolation components were estimated
using the reproducibility validation and the assay
components were estimated using the RNA precision
validation. The total variability, $\sigma^2_{\text{total}}$, was calculated as:

$$\sigma^2_{\text{total}} = \sigma^2_{\text{tissue}} + \sigma^2_{\text{RNA assay}}$$

where $\sigma^2_{\text{tissue}}$ was estimated as the sum of the site-to-site
and section components estimated in the tissue reproducibility
study, and $\sigma^2_{\text{RNA assay}}$ was estimated as the total
variation from the RNA precision study.
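
Because the components are combined on the variance scale, the total SD is the square root of the summed variances, not the sum of the component SDs. A minimal Python sketch of this combination follows; the function name and the component values in the example are hypothetical and are not estimates from this study.

```python
import math


def total_sd(var_site: float, var_section: float, var_rna_assay: float) -> float:
    """Combine variance components into a total standard deviation.

    Tissue variance is the sum of the site-to-site and tissue-section
    components; the RNA assay variance is added to it.
    """
    var_tissue = var_site + var_section
    return math.sqrt(var_tissue + var_rna_assay)


# Hypothetical variance components in squared ROR units.
example_sd = total_sd(var_site=0.1, var_section=7.5, var_rna_assay=0.5)
print(round(example_sd, 2), example_sd < 4.3)  # 2.85 True
```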

Additional categorical analyses were performed using 
two classifications: 



3 risk-categories (low, intermediate, and high) using 
both the node-negative and node-positive cutoffs, 

4 intrinsic subtype categories (Luminal A, Luminal 
B, Basal-Like, HER2-enriched) 



RNA from each tissue sample was tested twice at each 
site so there are 4 possible comparisons between sites 
for each tissue sample leading to a total number of pos- 
sible comparisons of 4*number of tissue samples. For 
each of the two classification schemes (risk category or 
subtype), the pair-wise concordance between sites was 
estimated as the fraction of all possible comparisons that 
were concordant and an exact-type 95% confidence 
interval was calculated. 
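
The concordance calculation can be illustrated with a short Python sketch. Two replicates at each of two sites give 2 × 2 = 4 cross-site comparisons per tissue sample. The confidence interval here is a Clopper-Pearson interval found by bisection; the text says only "exact-type", so this choice is an assumption. The function names and example calls are hypothetical, and treating the four comparisons per sample as independent binomial trials is a simplification.

```python
import math
from itertools import product


def pairwise_concordance(calls_a: dict, calls_b: dict):
    """Count concordant cross-site comparisons.

    calls_a / calls_b map sample ID -> list of replicate calls at one site.
    """
    concordant = total = 0
    for sample in calls_a.keys() & calls_b.keys():
        for a, b in product(calls_a[sample], calls_b[sample]):
            total += 1
            concordant += int(a == b)
    return concordant, total


def _binom_cdf(k: int, n: int, p: float) -> float:
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(func, target: float, increasing: bool, iters: int = 100) -> float:
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for k/n."""
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - _binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if k == n else _bisect(
        lambda p: _binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper


# Hypothetical subtype calls for two samples at two sites.
site_1 = {"S01": ["Luminal A", "Luminal A"], "S02": ["Luminal B", "Luminal A"]}
site_2 = {"S01": ["Luminal A", "Luminal A"], "S02": ["Luminal B", "Luminal B"]}
k, n = pairwise_concordance(site_1, site_2)
print(k, n, clopper_pearson(k, n))
```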

In addition, a post hoc analysis compared the normal- 
ized gene expression from the 50 classifier genes be- 
tween the tissue replicates from all valid specimens 
tested at each site using a linear regression and correlation
analysis.

RNA input: study design 

Thirteen FFPE breast tumor blocks containing pathologically- 
confirmed infiltrating ductal carcinoma were obtained and 
RNA was extracted from multiple slide mounted tissue 
sections from each block using the defined procedure 
(Figure 3). The individual RNA isolates from each FFPE 
block were pooled. Each pooled tumor RNA sample was 
tested in duplicate across three RNA input levels within 
the assay specification range (500, 250, and 125 ng) and in 
singlet at two additional RNA input levels outside of the 
specification range (625, 62.5 ng). Two no-target (water) 
measurements were also tested in duplicate on every run. 
All tumor RNA samples were assumed to be node- 
negative with a tumor size of < 2 cm for this analytical 
study since these clinical covariates have no impact on the 
measured variation in the ROR score. All samples were 
tested using two different Prosigna assay reagent lots. 

Figure 3 Overview of the design for the RNA input study. RNA
from 13 tissue samples was tested across and beyond the RNA input
range specified for the assay. Design elements shown: samples 1-13
processed at Site 1 (single pathologist and operator) with RNA
isolation kit lot 1 and tested at 625 ng, 500 ng x 2, 250 ng x 2,
125 ng x 2, 62.5 ng, and 0 ng x 2.

RNA input: statistical analysis

The pre-specified primary aim of the RNA input study
was to demonstrate that the Prosigna assay results were
unchanged at the extremes of the assay specification
range (125 and 500 ng RNA) regardless of the assay reagent
kit lot used. For each kit lot, the test statistic was
the average difference between the mean ROR score at a
given input level ($\mathrm{ROR}_{L,j}$) and the mean ROR score at
the nominal level ($\mathrm{ROR}_{N,j}$):

$$\text{Average Difference} = \frac{1}{n} \sum_{j=1}^{n} \left( \mathrm{ROR}_{L,j} - \mathrm{ROR}_{N,j} \right)$$

where the average is across the n different samples. In
this equation, $\mathrm{ROR}_{N,j}$ is the average of two replicates at
the nominal level and $\mathrm{ROR}_{L,j}$ is the average of two replicates
for input levels within specification, or the single
result for input levels outside of specification. Equivalence
was pre-defined as an observed absolute average
ROR difference significantly less than 3. To test the non-equivalence
hypothesis that the true absolute mean difference
is greater than 3, a 90% confidence interval for
the difference was calculated. This 90% confidence interval
corresponds to the two one-sided test approach for
bioequivalence [31]. The input level was determined to
be equivalent to the nominal level if the 90% confidence
interval was completely contained within -3 and 3.
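
The equivalence decision can be sketched in Python as follows. The text does not state how the 90% confidence interval was constructed, so a Student t interval on the per-sample differences is assumed here. The function name and the differences in the example are hypothetical. The critical value 1.782 is the one-sided 95% t quantile for 12 degrees of freedom (13 samples) and would need to change if fewer samples passed.

```python
import math
from statistics import mean, stdev

T_CRIT_95_DF12 = 1.782  # one-sided 95% Student t quantile, 12 degrees of freedom


def equivalence_by_ci(differences, t_crit: float, margin: float = 3.0):
    """Two one-sided tests expressed as a 90% confidence interval check.

    differences: per-sample ROR differences (input level minus nominal).
    Returns (average_difference, (lower, upper), equivalent).
    """
    n = len(differences)
    avg = mean(differences)
    half_width = t_crit * stdev(differences) / math.sqrt(n)
    lower, upper = avg - half_width, avg + half_width
    return avg, (lower, upper), (-margin < lower and upper < margin)


# Hypothetical differences for 13 samples, not study data.
diffs = [0.5, -0.5] * 6 + [0.0]
avg, (lower, upper), equivalent = equivalence_by_ci(diffs, T_CRIT_95_DF12)
print(round(avg, 3), round(lower, 3), round(upper, 3), equivalent)
```

Requiring the whole interval to lie inside (-3, 3) is stricter than requiring only that the point estimate does, which is why a noisy input level can fail equivalence even when its average difference is small.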

For each pooled sample a linear regression and correlation
analysis was also performed between each replicate
at each RNA input level and one of the two replicates
run at 250 ng of RNA. The difference in the ROR score
(ΔROR) from the nominal RNA input level (250 ng) for
each replicate at each RNA input level was calculated by
subtracting the ROR score calculated from one of the
two replicates run at 250 ng from ROR scores calculated
at the other input levels. Additionally, the ΔROR was
calculated and linear regression and correlation analyses
were also performed between the two replicates at
250 ng. The mean ΔROR, slope, intercept, and correlation
values (with 95% confidence intervals) were calculated
using the pairwise comparisons for all passing
samples at each input level for both kit lots.

For the no-target (water) samples, the percentage of 
samples failing the minimum threshold for expression of 
normalizing genes was calculated. All no-target samples 
were required to give a failing test result. 

Tissue interferents: study design and analysis 

Twenty three FFPE breast tumor blocks were obtained 
containing pathologically-confirmed infiltrating ductal 
carcinoma microscopically-assessed to have 10 - 95% of 
the total tissue area containing normal/non-tumor tis- 
sue. Pathologists identified additional tumor interferents 
(DCIS, necrotic tissue, or blood/hemorrhagic tissue) 
within or near the margins of the tumor in ten of the 23 
blocks. 



For each FFPE breast tumor block, H&E stained slides 
were prepared and up to nine unstained sections were 
cut and mounted on slides. For the inclusion of the 
interferent, the sections were processed according to the 
assay procedure with the exception that identified nor- 
mal/non-tumor tissue or any additional interferents were 
included in the isolation ("non-macrodissected slides"). 
For the macrodissection where the non-tumor and other 
interferents were removed, three or (in the case of small 
tumor surface areas) three and six slides were processed 
according to the Prosigna assay protocol. 

The change in ROR (ΔROR) due to the interferent
was calculated as the ROR score from the non-macrodissected
slides minus the ROR score from the
macrodissected slides (Figure 4). For the tissue blocks
where three and six macrodissected slides were independently
isolated and both produced a passing assay result,
the average of the two ROR scores was used to
calculate the ΔROR.
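
The ΔROR calculation, including the averaging rule for blocks with both three-slide and six-slide macrodissected results, can be written as a small Python function. The function name is hypothetical. The first example reuses the illustrative values from Figure 4 (25 and 30); the second uses made-up scores.

```python
from statistics import mean


def delta_ror(ror_non_macrodissected: float,
              ror_macrodissected: list[float]) -> float:
    """ROR change due to interferents.

    ror_macrodissected holds one passing result, or two when both the
    three-slide and six-slide isolations passed; two results are averaged.
    """
    if not ror_macrodissected:
        raise ValueError("at least one passing macrodissected result is required")
    return ror_non_macrodissected - mean(ror_macrodissected)


print(delta_ror(25, [30]))      # -5, as in the Figure 4 illustration
print(delta_ror(25, [30, 32]))  # -6, hypothetical three- and six-slide results
```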

Results 

RNA precision: variance components analysis 

The precision of the Prosigna assay starting from RNA was 
assessed with 5 pooled breast tumor RNA samples each 
tested 36 times at each of the three sites. There were no in- 
dividual test samples that failed the pre-specified data QC 
metrics in the software so the analysis includes 540 results 
from 54 valid runs. For all five tumor RNA samples, the 
total SD was less than 1 ROR unit on a 0 - 100 scale 
(Table 2), and there was 100% concordance between mea- 
sured subtype result and expected subtype result as well as 
measured and expected risk group. More than 60% of 
the measured variability came from within-run variance 
(repeatability) while less than 2% of the variance was 
attributable to site-to-site variance or operator-to-operator 
variance. The differences in mean ROR scores between sites
were less than 0.5 ROR units on a 0-100 scale and were
not statistically significant for any tested sample (Additional file 1:
Table S1). The contribution to overall variance by the three
reagent lots was approximately 20% of the total variance on 
average, but the differences were all less than 1 ROR unit. 
At each site, the normalized gene expression between RNA 
replicates was highly correlated with slopes ranging from 
0.98 - 1.00, intercepts at 0, and r values of 0.99. 

The distribution of measured ROR scores for each of 
the five pooled RNA samples was also examined across 
the three lots, six users and three test sites. The range of 
ROR scores for the 108 independent measurements was 
<4 units for each of the 5 sample pools (Figure 5). 

Reproducibility: test sample quality control and 
characterization 

The call rate for the 43 tissue samples evaluated was 
95%, 93%, and 100% for sites 1, 2, and 3 respectively. 

Figure 4 Overview of tissue processing for assessing the effect of tissue interferents. Multiple sections from FFPE breast tumor blocks were
mounted onto slides and processed with or without macrodissection. The change in ROR score (ΔROR) is calculated as the ROR score from the
non-macrodissected slides minus the ROR score from the macrodissected slides (in the illustration, ΔROR = 25 - 30 = -5).



Forty samples yielded results at all sites (RNA isolation 
of one sample at one site required repeating). One tissue 
sample yielded results at 2 sites, and 2 samples yielded 
results at a single site, while the other sites did not obtain 
sufficient RNA to perform the assay for these samples. The 
measured tumor surface area for 4/5 RNA isolation failures 
was very small (< 15 mm²). One hundred percent (100%) of 
samples passing tissue review and RNA isolation specifications 
yielded passing results from the Prosigna assay. 

The calculated test results from the 43 tissues 
across all sites represent a wide range (94 units) of 
ROR scores (Figure 6) and all risk categories when 
applying the node-negative or node-positive ROR 
score cutoffs to all samples. All four intrinsic subtypes 
were also represented among the 43 specimens. The 
two samples for which RNA could be isolated at only one 
site were excluded from all subsequent statistical analyses 
because no data were available for comparison across sites. 
Both of these samples had ROR scores of less than 10 and 
were classified as Luminal A. 

Reproducibility: variance components analysis 
(primary objective) 

Table 3 shows the results of the variance components ana- 
lysis using all 41 tissue specimens where replicate measure- 
ments were available. The "tissue section" variation, which 
consists of variation contributed by within FFPE block sec- 
tions, pathology review, and tissue processing, was the 
dominant source of variation (> 90% of total variance). 
The differences on average between the sites were negligible 
(< 1% of total variance). The combined run-to-run variabil- 
ity and within-run variability in the assay (determined from 
the duplicate measurements from each RNA isolation from 
the reproducibility study) was consistent with the variability 
measured in the RNA-precision study (variance of 0.51 
compared to 0.47 for the RNA-precision study). 

The total SD including all sources of variation (tissue 
and RNA processing variability) was 2.9, indicating that 



Table 2 Variance components for the five pooled RNA samples across 108 replicates 

| Pooled RNA sample | Mean ROR score | Reagent lot | Site | Operator | Run | Within-run | Total variance | Total SD |
|---|---|---|---|---|---|---|---|---|
| Basal-like | 55.4 | 0.059 (20%) | 0.000 (0%) | 0.000 (0%) | 0.046 (15%) | 0.194 (65%) | 0.299 (100%) | 0.55 |
| HER2-enriched | 76.2 | 0.165 (37%) | 0.000 (0%) | 0.000 (0%) | 0.000 (0%) | 0.277 (63%) | 0.442 (100%) | 0.66 |
| Luminal A | 31.4 | 0.010 (2%) | 0.000 (0%) | 0.000 (0%) | 0.134 (30%) | 0.296 (67%) | 0.44 (100%) | 0.66 |
| Luminal B 1 | 55.0 | 0.105 (18%) | 0.000 (0%) | 0.000 (0%) | 0.046 (8%) | 0.426 (74%) | 0.576 (100%) | 0.76 |
| Luminal B 2 | 64.8 | 0.119 (21%) | 0.014 (2%) | 0.000 (0%) | 0.064 (11%) | 0.380 (66%) | 0.576 (100%) | 0.76 |

Reagent lot, Site, Operator, Run and Within-run are the variance components. The percent of total variance is listed in parentheses after each estimated variance. 

Figure 5 Distribution of 108 ROR scores measured for each of the 5 pooled RNA samples. Boxplots show the distribution of ROR scores 
relative to the 0-100 range and the histograms show the frequency of the measured ROR scores on a 20-point range. Boxplots and histograms 
are colored by the intrinsic subtype result for each sample. 



the Prosigna assay can measure a difference between two 
ROR scores of 6.75 with 95% confidence. 
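
The text does not state how the 6.75 figure was derived. One reading that
reproduces it, offered here only as an assumption, treats the difference
between two independent ROR measurements as having a standard deviation of
sqrt(2) times the total SD and applies the one-sided 95% normal quantile
(about 1.645) to the rounded SD of 2.9. The sketch below first recovers the
total SD from the variance estimates in Table 3 and then applies that
assumed formula.

```python
from math import sqrt
from statistics import NormalDist

# Variance estimates from Table 3 (ROR units squared).
site_var = 0.10     # tissue processing: site
tissue_var = 7.72   # tissue processing: within block/process
rna_var = 0.47      # RNA processing

total_var = site_var + tissue_var + rna_var
total_sd = sqrt(total_var)

# Assumption (not stated in the source): one-sided 95% normal quantile,
# applied to the SD of a difference of two independent measurements,
# using the total SD rounded to one decimal place as reported (2.9).
z = NormalDist().inv_cdf(0.95)
min_detectable_diff = z * sqrt(2) * round(total_sd, 1)

print(f"total variance = {total_var:.2f}")
print(f"total SD = {total_sd:.2f}")
print(f"difference detectable with 95% confidence = {min_detectable_diff:.2f}")
```

Using the unrounded SD instead of 2.9 gives a slightly smaller value (about
6.7), so the exact figure depends on the rounding convention as well as on
the assumed quantile.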

Reproducibility: subtype and risk category classifications 
concordance 

The site-to-site concordances for the two categorical 
classifications are shown in Table 4, in each case with 




Figure 6 Reproducibility of the ROR score in the tissue 
reproducibility study. Average tissue block ROR score (tissue average, 
0-100 scale) compared to the individual ROR score for all samples. 
Data are colored by the intrinsic subtype result. The high, 
intermediate, and low node-negative risk categories are shown to the 
right of the figure, with the risk thresholds shown as lines in the 
body of the figure. 



exact-type 95% confidence intervals. For each compari- 
son (subtype and node negative and positive risk cat- 
egories), the average concordance between sites was at 
least 90%. There were no samples where the risk cat- 
egory changed from low risk to high risk (or vice versa) 
between or within sites when the samples were assumed 
to be from node negative patients. There were only two 
intermediate/high risk samples that did not give identical 
subtypes across all 6 replicates: 

• One sample had duplicate Luminal A results at one 
site and duplicate Luminal B results at each of the 
other two sites. 

• One specimen had duplicate Luminal A results at 
one site, duplicate HER2-enriched results at another 
site and one each of Luminal A and HER2-enriched 
at the third site. 
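
The "exact-type" intervals in Table 4 are not specified further in the text.
One common exact method for a binomial proportion is the Clopper-Pearson
interval, and the sketch below implements it with the standard library only,
solving for each limit by bisection. The choice of method and the example
counts (38 concordant results out of 40) are assumptions made for
illustration; they are not taken from the study's analysis plan.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, tol: float = 1e-10) -> float:
    """Bisection root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided interval for x successes in n trials."""
    alpha = 1 - conf
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


# Hypothetical counts: 38 concordant calls out of 40 paired samples (95%).
low, high = clopper_pearson(38, 40)
print(f"95% CI: [{100 * low:.1f}%-{100 * high:.1f}%]")
```

For these example counts the interval lands close to the one reported in
Table 4 for a 95% concordance, but other rows may have been computed from
different counts or with a different exact-type method.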

Reproducibility: pairwise correlation coefficients of gene 
expression 

The average intercept, slope, and Pearson's correlation 
of the pair-wise comparisons between sites are reported 



Table 3 Total variability (from tissue and RNA processing) 
of the Prosigna assay 

| Tissue processing variability: site | Tissue processing variability: within block/process | RNA processing variability | Total variability | Total SD |
|---|---|---|---|---|
| 0.10 | 7.72 | 0.47 | 8.29 | 2.9 |

The total SD of 2.9 is on a 0-100 ROR scale. 

Table 4 Concordance of subtype calls and risk categories between the three sites 

| Comparison type | Site 1 vs. Site 2 (n = 40) | Site 1 vs. Site 3 (n = 41) | Site 2 vs. Site 3 (n = 40) | Average concordance |
|---|---|---|---|---|
| Subtype | 96.3% [86.4%-99.5%] | 98.8% [91.0%-100%] | 95% [83.1%-99.3%] | 97% |
| Risk category (node negative) | 87.5% [73.2%-95.8%] | 92.7% [80.1%-98.4%] | 90% [76.4%-97.2%] | 90% |
| Risk category (node positive) | 90.0% [76.9%-96.0%] | 95.1% [83.9%-98.7%] | 95.0% [83.5%-98.6%] | 93% |

The pairwise (site to site) concordance is reported with the 95% confidence interval. 



with the 95% confidence interval (Table 5). The gene ex- 
pression between tissue replicates was highly correlated 
between sites with slopes ranging from 0.97 - 1.00, in- 
tercepts at 0, and r values of 0.98 or greater. Equivalent 
or higher correlation values were observed when a simi- 
lar analysis was performed for the RNA replicates tested 
at each site (Additional file 2: Table S2). Additionally, 
hierarchical clustering analysis demonstrated that tissue 
sample and RNA sample replicates were always and only 
clustered together across a wide range of expression in 
each of the 50 genes across all samples tested (Additional 
file 3: Figure S1). 
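
The pairwise comparison described above reduces, for each pair of replicates,
to an ordinary least-squares fit and a Pearson correlation over the
normalized gene expression values. The sketch below shows that calculation
for a single pair. The function name and the expression values are
hypothetical, and only five values are used for brevity where the assay
compares 50 genes.

```python
from math import sqrt


def compare_replicates(x, y):
    """Return (intercept, slope, pearson_r) for y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


# Hypothetical normalized expression values for the same genes at two sites.
site_a = [-1.20, -0.45, 0.00, 0.60, 1.35]
site_b = [-1.15, -0.50, 0.05, 0.55, 1.30]

intercept, slope, r = compare_replicates(site_a, site_b)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, r = {r:.3f}")
```

Averaging these three statistics over all replicate pairs would give summary
values of the kind reported in Table 5; how the study computed the
confidence intervals around those averages is not described here.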

RNA input: test sample quality control 

The average ROR scores for the tested samples covered a 
broad range (20-82) and all intrinsic subtypes, including 
5 Luminal A, 4 Luminal B, 3 HER2-enriched 
and 1 Basal-like sample (Additional file 4: Figure S2). 
One FFPE block was tested with a single kit lot due to 
insufficient RNA mass from the isolation for the second 
lot. Two runs (each with different samples) failed to pro- 
vide passing results for one of the two lots tested due to 
a processing error detected by system controls with 

Table 5 Site to site gene expression comparisons from 
the tissue reproducibility study 

| Comparison | Pairwise (n) | Intercept [95% CI] | Slope [95% CI] | Pearson [95% CI] |
|---|---|---|---|---|
| All Sites | 121 | 0.00 [-0.01 to 0.01] | 0.98 [0.97 to 0.99] | 0.98 [0.98 to 0.98] |
| Site 1 vs. Site 2 | 40 | 0.00 [-0.01 to 0.01] | 0.97 [0.95 to 0.98] | 0.98 [0.97 to 0.98] |
| Site 1 vs. Site 3 | 40 | 0.01 [0 to 0.02] | 1.00 [0.98 to 1.01] | 0.98 [0.98 to 0.99] |
| Site 2 vs. Site 3 | 41 | -0.01 [-0.02 to 0] | 0.99 [0.97 to 1] | 0.99 [0.98 to 0.99] |

Pairwise correlations, slopes, and intercepts of the 50 normalized genes for tissue 
replicates from the tissue reproducibility study. The average intercept, slope, 
and Pearson's correlation of the pair-wise comparisons are reported with their 
95% confidence intervals. 



insufficient RNA to repeat the assay. All measured no- 
target samples (n = 46) were well below the threshold for 
signal and yielded a failing test result (0% call rate). All 
tumor RNA measurements within assay specification (n = 
138) yielded a passing test result (100% call rate). One 
hundred percent (100%) of specimens with input above 
specification (625 ng) yielded a passing test result. Eighty- 
three percent (83%) of specimens (10/12) tested at input 
below specification (62.5 ng) yielded a test result in lot 1, 
as did 100% in lot 2. 

RNA input: ROR score difference and pairwise correlation 
coefficients of gene expression 

For each of the two reagent lots tested, the confidence 
interval around the mean ROR score difference between 
the nominal input and the RNA input limits (125 and 
500 ng) was completely contained within -3 and 3 ROR 
units. The ROR scores at 125 and 500 ng RNA were 
therefore equivalent to those at the target input 
concentration of 250 ng for each of the two reagent kit lots 
tested, meeting the primary objective of the study. Of note, when 
characterizing the RNA levels outside of the assay specifi- 
cation, the ROR scores at 62.5 ng RNA were not equiva- 
lent (with an upper confidence limit at 3.26) to those at 
the target input concentration of 250 ng for one of the 
two lots tested. This illustrates the importance of perform- 
ing the assay according to the defined procedure. 
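
The equivalence criterion used above is a containment check: the confidence
interval for the mean ROR difference must lie entirely within -3 and 3 ROR
units. A minimal sketch of that check follows. The function name is
illustrative, the two example intervals are the combined-lot ΔROR intervals
from Table 6 (the per-lot intervals used for the primary objective are not
reproduced in this section), and the lower limit paired with the reported
upper limit of 3.26 is a placeholder.

```python
def is_equivalent(ci_low: float, ci_high: float, margin: float = 3.0) -> bool:
    """True when the whole interval [ci_low, ci_high] lies inside (-margin, margin)."""
    return -margin < ci_low and ci_high < margin


# Combined-lot ΔROR intervals from Table 6, used only as illustrative inputs.
table6_ci = {125: (-0.89, 0.80), 500: (-1.39, 0.26)}
results = {mass: is_equivalent(lo, hi) for mass, (lo, hi) in table6_ci.items()}
print(results)

# An upper limit of 3.26 (reported for 62.5 ng in one lot) fails the check
# whatever the lower limit is; the 0.0 used here is a placeholder value.
fails = is_equivalent(0.0, 3.26)
print(fails)
```

Because the check is on the interval rather than on the point estimate, a
small mean difference can still fail when the interval is wide, which is
what happened at 62.5 ng in one lot.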

When the lots were combined, the normalized gene 
expression values and ROR scores were consistent with those 
at the target input concentration of 250 ng within and 
even outside the RNA input limit specifications (Table 6). 
Characterization of intrinsic subtype across the samples 
tested shows a 100% concordance in subtype call across 
all samples and inputs. Similarly, there is a 100% concord- 
ance by risk classification across all samples and inputs. 

Tissue interferents: test sample quality control 

Of the 23 samples, six were Luminal A, seven were 
Luminal B, two were HER2-enriched, and eight were 
Basal-like. The average ROR scores for the 23 samples covered a 
broad range (10-83) (Additional file 5: Figure S3). 

Table 6 Comparison of gene expression at different masses from the RNA input study 

| Mass (ng) | Pairwise (n) | Pearson [95% CI] | Slope [95% CI] | Intercept [95% CI] | ΔROR [95% CI] |
|---|---|---|---|---|---|
| 62.5 | 21 | 0.97 [0.93 to 0.99] | 0.96 [0.91 to 1.00] | -0.02 [-0.05 to 0.01] | 0.48 [-1.27 to 2.22] |
| 125 | 46 | 0.99 [0.97 to 0.99] | 0.98 [0.96 to 1.01] | -0.01 [-0.03 to 0.01] | -0.04 [-0.89 to 0.8] |
| 250 | 23 | 0.99 [0.98 to 1.00] | 1.00 [0.99 to 1.01] | 0.00 [-0.01 to 0.01] | -0.39 [-0.96 to 0.17] |
| 500 | 46 | 0.99 [0.99 to 1.00] | 0.97 [0.96 to 0.99] | 0.02 [0.01 to 0.04] | -0.57 [-1.39 to 0.26] |
| 625 | 23 | 0.99 [0.98 to 1.00] | 0.95 [0.92 to 0.99] | 0.03 [0.01 to 0.06] | -0.78 [-2.2 to 0.63] |

Pairwise correlations, slopes, and intercepts of the 50 normalized genes and change in ROR score (ΔROR) for replicate RNA hybridizations with different mass inputs. The 
average intercept, slope, Pearson's correlation, and change in ROR for the pair-wise comparisons are reported with their 95% confidence intervals. 



Tissue interferents: impact on ROR score 

As the amount of adjacent non-tumor tissue increases 
there is an increasing risk that the reported ROR score 
will be an underestimate or negatively biased (up to -19 
ROR score units for samples containing 95% non-tumor 
tissue) estimate of a patient's risk of recurrence (Figure 7). 
Elimination of the macrodissection step required by the 
assay also caused a change in subtype determination for 
five out of 23 samples. Three Luminal B samples, one 
HER2-enriched, and one Basal-like sample were classified 
as Luminal A due to inclusion of adjacent non-tumor tis- 
sue. In contrast, the presence of intratumor hemorrhage, 
necrosis or DCIS (not removed by macrodissection) had 
little effect on ROR. 

Discussion 

Breast cancer gene expression testing has been the sub- 
ject of many studies demonstrating its capacity to strat- 
ify breast cancers by prognostic risk [9,15,16,32,33]. 
Increasingly, studies are also showing the value of such 
signatures to predict response to therapy, for example by 
using these tests to evaluate archival specimens from 
randomized clinical trials [34-36]. The integration of 
molecular genomic testing into cancer care is an active 
area of development, with huge genomic datasets be- 
coming available. Great improvements in experimental 
design and bioinformatic analysis have led to the devel- 
opment of robust signatures ripe for translation into 
clinical tests. Studies applying these signatures to differ- 
ent clinical series with observational, case-control, co- 
hort and randomized trial designs have generated 
increasingly strong evidence for clinical validity, particularly 
in breast cancer [19,34,35,37]. It is against this backdrop 
that the Evaluation of Genomic Applications in Practice 
and Prevention (EGAPP) working group was formed to 
guide best practices in experimental design and the in- 
terpretation of evidence for utility in clinical practice 
[23]. Fundamental to EGAPP criteria is the concept that 
clinical utility requires not only clinical validity (linking 
test results to clinical presentation, treatment and out- 
come), but just as importantly, analytical validity (the 
capacity of the test classifier to be sensitive, specific and 



reproducible in practice). However, EGAPP found that 
relatively few studies of breast cancer molecular classifiers 
have directly reported on analytical reproducibility [38]. 

Analytical reproducibility is a requirement for the im- 
plementation of all diagnostic tests, but it is especially 
critical for decentralized tests given the challenges of 
maintaining reproducibility across pathologists, technical 
operators, and instrumentation. However, decentralized 
tests also have many advantages over Laboratory Devel- 
oped Tests that are performed at single central labora- 
tories. By avoiding the need for shipping tissues, 
turnaround times and costs are reduced. The capacity 
for the laboratory physician to interact directly with the 
treating physician greatly aids medical care, for example in 
facilitating appropriate prioritization of critical specimens, 

Figure 7 Effect of non-tumor tissue on the ROR score. The 
impact of including adjacent non-tumor tissue on ROR was assessed 
by determining the change in test results from slide mounted 
sections with vs. without macrodissection of adjacent non-tumor 
tissue. Data colors indicate whether the interferent was only 
normal/non-tumor tissue or whether additional non-tumor interferents 
(DCIS, necrotic tissue, or blood/hemorrhagic tissue) were identified 
within or near the margins of the tumor. 

explaining equivocal or unexpected results, and quickly 
recognizing inadequate specimens and what can be done 
to get a result helpful to the patient as soon as possible. 
Although first generation breast cancer prognostic tests 
were performed in central labs [32], second generation 
tests are being developed and validated to realize the ad- 
vantages of decentralized testing [39]. 

The Prosigna assay was tested across a range of RNA 
mass inputs that is consistent with what will be expected 
in a clinical lab setting. The assay is robust across that 
range, similar to what has been reported with other mul- 
tigene breast cancer tests [32,39]. Additionally, the assay 
gave consistent results outside the specified assay RNA 
input limits; only 2 samples failed to produce passing 
results at half the lowest specified mass, further illustrating 
the robustness of the assay. 

The observation of biased subtype calls and ROR scores 
with the inclusion of non-tumor tissue is consistent with a 
prior study [40]; however, the bias reported herein is less 
severe. Similar to what is expected in clinical practice, the 
interferent measured here is reported as percent adjacent 
non-tumor tissue included, rather than as percent non-tumor 
RNA from a separate paired normal tissue sample, as reported 
in the earlier study. Normal breast tissue yields less total 
RNA than tumor tissue [41], and adjacent non-tumor tissue at 
the margins of the tumor has certain cancer pathways activated 
where matched healthy breast does not [42,43]. Nonetheless, this 
study illustrates the importance of performing the 
macrodissection according to the defined procedure to 
maximize the accuracy of the test. 

The precision and reproducibility of the Prosigna 
assay, estimated from repeat measurements of pooled 
tumor RNA samples and de-identified patient tissue 
samples across multiple testing sites, are similar (relative 
to the overall test range) to what was previously 
reported for centralized lab tests [32,38]. These results 
demonstrate that the Prosigna assay is analytically repro- 
ducible even when performed at multiple test sites and 
including all process variables. It will be important for 
local labs to verify the reproducibility reported herein 
when implementing this decentralized assay to ensure the 
quality of the test's results, including ongoing process 
monitoring [38]. 

Our experience of implementing the nCounter platform 
in our CLIA-certified hospital laboratory environments 
proved to be straightforward, confirming the 
simplicity of the assay and its suitability as an in vitro 
diagnostic test. Training on the assay workflow (including 
tissue macrodissection, RNA isolation and setup of the 
Prosigna assay) takes less than one week. The pre-specified 
SOPs are easy to follow, and the procedures for 
RNA extraction and the Prosigna assay are straightforward. 
All operators, most of whom were naive users of the 
Prosigna assay, were able to pass the training procedures 
on the first attempt, before executing the pre-specified 
study protocols. Although overnight incubations are required 
during RNA extraction and RNA-probe 
hybridization, the incubation temperature is constant, and 
hands-on time requirements for the whole experiment are 
very limited. Furthermore, the analyses for subtype call 
and ROR score are simplified and controlled by integrating 
the algorithm into the software for raw data processing, 
reducing the potential for human error in data 
cleaning and analysis. 

Conclusion 

The FDA-cleared and CE-marked Prosigna assay, based on 
the PAM50 gene expression signature, has recently been 
shown to predict the risk of distant recurrence in women 
with hormone receptor-positive early-stage breast cancer 
treated with five years of endocrine therapy [19,20]. This 
demonstration of analytical reproducibility generates a 
strong body of evidence supporting the decentralized use 
of this test as a tool for breast cancer risk stratification. 
Additional ongoing studies of the clinical validity of the 
PAM50 gene expression signature for chemosensitivity 
prediction [34-36] could, if confirmed, be considered clin- 
ically actionable given the demonstrated analytical validity 
of this test. 

Additional files 



Additional file 1: Table S1. Site to site ROR sample means. Mean ROR 
scores were calculated for each pooled RNA sample, and likelihood ratio 
test for significance was performed to test for differences between sites. 
There were no significant differences in the results observed across sites 
for the five pooled RNA samples tested. All p-values were well above 0.05 
for the likelihood ratio test of significance of site with 2 degrees of 
freedom for each pooled RNA sample. The differences in means between 
sites were all less than 0.5 ROR units on a 0-100 scale. 

Additional file 2: Table S2. Within site gene expression comparisons 
from the tissue reproducibility study. Pairwise correlations, slopes, and 
intercepts of normalized 50 genes for replicate RNA Hybridizations from 
the tissue reproducibility study. The average intercept, slope, and 
Pearson's correlation of the pair-wise comparisons are reported with the 
95% confidence interval. 

Additional file 3: Figure S1. Hierarchical clustering of all samples from 
the tissue reproducibility study. Clustering analysis (using a Pearson's 
distance metric and average linkage) was performed on the median 
centered normalized, Log2 transformed and scaled sample data to 
further characterize the gene expression in the tissue samples. The tissue 
sample and RNA sample replicates were always only clustered together 
and the node heights are almost imperceptibly low (indicating highly 
correlated gene expression). 

Additional file 4: Figure S2. Average ROR Score for the 13 unique 
tumor RNA samples within the RNA Input Study. Data are colored by the 
intrinsic subtype result at 250 ng of RNA. 

Additional file 5: Figure S3. ROR Score for the 23 unique 
macrodissected tumor samples. Data are colored by the intrinsic subtype 
result for each tissue. For tissues with multiple isolations, the subtype result 
illustrated was from the macrodissection with the greatest number of slides 
processed. 